ABSTRACT

Context. The first step when investigating time varying data is the detection of any reliable changes in star brightness. This step is 
crucial to decreasing the processing time by reducing the number of sources processed in later, slower steps. Variability indices and 
their combinations have been used to identify variability patterns and to select non-stochastic variations, but the separation of true 
variables is hindered because of wavelength-correlated systematics of instrumental and atmospheric origin, or due to possible data 
reduction anomalies. 

Aims. The main aim is to review the current inventory of correlation variability indices and measure the efficiency for selecting 
non-stochastic variations in photometric data. 

Methods. We test new and standard data-mining methods for correlated data using public time-domain data from the WFCAM 
Science Archive (WSA). This archive contains multi-wavelength calibration data (WFCAMCAL) for 216,722 point sources with
at least 10 unflagged epochs in any of five filters (YZJHK), against which the different indices were tested. We improve the
panchromatic variability indices and introduce a new set of variability indices for preselecting variable star candidates. Using the 
WFCAMCAL Variable Star Catalogue (WVSC1) we delimit the efficiency of each variability index. Moreover we test new insights 
about these indices to improve the efficiency of detection of time-series data dominated by correlated variations. 

Results. We propose five new variability indices which display a high efficiency for the detection of variable stars. We determine the 
best way to select variable stars using these and the current tool inventory. In addition, we propose a universal analytical expression
to select likely variables using the fraction of fluctuations on these indices ($f_{fluc}$). The $f_{fluc}$ can be used as a universal way to analyse
photometric data since it displays only a weak dependency on the instrument properties. The variability indices computed in this
new approach allow us to reduce misclassification, and they will be implemented in an automatic classifier which will be addressed
in a forthcoming paper in this series.

Conclusions. 

Key words. Astronomical instrumentation, methods and techniques - Methods: data analysis - Techniques: photometric - Astronomical data bases - Astronomical databases: miscellaneous


1. Introduction 

The tremendous development in astronomical instrumentation
and automation during the last few decades has given rise to several
questions about how to analyse and synthesize the growing
amount of data. Recently, various dedicated telescope systems,
both on the ground and in space, have been used for wide-field,
shallow, low-resolution, multi-epoch imaging surveys, scanning
the sky in different wavebands with aims ranging from comprehensive
stellar variability searches to exoplanet hunting (e.g.
PanSTARRS, Kaiser et al. 2002; OGLE, Udalski 2003; SuperWASP,
Pollacco et al. 2006; CoRoT, Baglin et al. 2007;
NSVS, Hoffman et al. 2009; Kepler, Borucki et al. 2010).
These data have led to many discoveries in several areas
of modern astronomy: asteroseismology, exoplanets and stellar
evolution (e.g., Huber et al. 2012; De Medeiros et al. 2013;
Walkowicz & Basri 2013; Paz-Chinchón et al. 2015). The next
generation of these surveys, such as Gaia (Bailer-Jones et al.
2013) and the VISTA Variables in Via Lactea survey (VVV;
Minniti et al. 2010), are providing a high data flow for a wide
range of science applications in order to understand the dynamics
and stellar variability of the Milky Way galaxy.

The first step to investigating time varying data is the
detection of any reliable changes in star brightness (e.g.
Welch & Stetson 1993; Stetson 1996; Wozniak 2000; Shin et al.
2009; Ferreira Lopes et al. 2015). This step is crucial to decreasing
the running time by reducing the number of sources that
slower steps, such as period finding and classification, are run
on. The stochastic variations are mainly related to very bright
sources, caused by saturation of the detector whereby the flux
within the aperture will bleed out into nearby pixels and the
measured magnitude becomes dependent on the sky brightness
and seeing, or to very faint sources where the sky noise dominates,
providing an increase in the uncertainty of the measurements,
and a dependency on the sky brightness and seeing. Variability
indices and their combinations have been used to identify
variability patterns and to select non-stochastic variations
(e.g. Damerdji et al. 2007; Shin et al. 2009; Ferreira Lopes et al.
2015), but the separation of true variables from noisy data is
hindered because of wavelength-correlated systematics of instrumental
and atmospheric origin, or due to possible data reduction
anomalies. Detection methods have been optimized for specific
variability signals to detect supernovae, microlensing, transits,
and other variable sources (e.g. Alard & Lupton 1998; Wozniak
2000; Gössl & Riffeser 2002; Becker et al. 2004; Corwin et al.
2006; Yuan & Akerlof 2008; Renner et al. 2008). An important
step to optimising this process is to review the current inventory
of variability indices and determine the efficiency level for selecting
non-stochastic variations in photometric data.


The second step is to determine the main periods. There
are various methods used in astronomy for frequency analysis,
to name a few: the Deeming method (Deeming 1975), PDM-Jurkevich
(Stellingwerf 1978; Dupuy & Hoffman 1985), string
length minimization (Lafler & Kinman 1965; Stetson 1996;
Clarke 2002), information entropy (Cincotta et al. 1995), the
analysis of variance (ANOVA; Schwarzenberg-Czerny 1996),
and the Lomb-Scargle periodogram and its extension using error bars (Lomb
1976; Scargle 1982; Zechmeister & Kürster 2009). These methods
are based on the fact that the phase diagram of the light
curves (LCs) is smoothest when it is folded on its real
frequencies. Assessment of the significance of these frequencies
is a pertinent problem due to non-Gaussianity, multi-periodicity,
non-periodic variations, and the manner in which they should be
taken into account (Süveges 2014). From this point of view the variability
indices are a fundamental part of the variability analysis in
order to save running time and decrease the number of miscalculations
in the frequency analysis. The detection of non-periodic
variables, transients, and other aspects in regard to the significance
of peaks in a periodogram has not been completely solved
yet.


The last point is that the variability classification is intrinsically
related to the determination of reliable periods and to
determining a set of parameters that allows us to distinguish all
variability types. Automatic classifiers based on machine learning
have been applied to several large time-series datasets (e.g.
Wozniak et al. 2004; Debosscher et al. 2007; Sarro et al. 2009;
Blomme et al. 2010; Richards et al. 2011; Dubath et al. 2012).
The inclusion of periodic and non-periodic features, statistics
and more sophisticated model parameters has improved automatic
classifiers (e.g. Richards et al. 2011). Misclassification,
fuzzy boundaries between variable star classes, mislabelled
training sets, as well as the full processing of terabytes of data, are
current scientific challenges (Eyer 2006).

The present paper is the first in a series of papers covering
different aspects of variable star selection and classification. The
first two articles are related to the selection of variable stars using
variability indices. In this paper, we discuss the selection of variable
stars using correlation variability indices, while in the second
of this series we will discuss non-correlation variability indices;
Paper 3 will be about periodicity search methods; Paper 4
will be about the variable star classifier. In this work, we perform
a comprehensive stellar variability analysis on time varying data.
In Sect. 2 we describe the data used to compare each index, using
a pre-selected catalogue of known variable stars to test how
well each index selects these and the efficiency of the selection,
measured by how few additional stars are selected by the same
cutoff value. In Sect. 3 we present an overview of commonly
used correlation variability indices and propose 5 new variability
indices. Next, in Sect. 4 we analyse the limits of correlated
variability indices and propose a false alarm probability
for variability indices. We present our results and discussions in
Sect. 6. Finally, in Sect. 7 we draw our conclusions and discuss
some future perspectives.


2. Data

2.1. WFCAMCAL database

The public WFCAM Calibration programme (WFCAMCAL;
Hodgkin et al. 2009; Cross et al. 2009) is a unique programme
that is well suited to testing the panchromatic variability
indices and our assumptions. This programme contains panchromatic
data for 58 different pointings distributed over the full
range in right ascension and spread over declinations between +59.62°
and -24.73°. These were used to calibrate the UKIDSS surveys
(Lawrence et al. 2007). The pointing closest to the zenith was
chosen whenever a calibration field was observed. This was
typically every hour early on in the UKIDSS observations
and later every 2 hours, with some early nights having many
additional observations (up to 40 in a night). During each visit
the fields were usually observed with a sequence of filters, either
through JHK or ZYJHK filters, within a few minutes. This led
to an irregular sampling with fields observed again roughly on
a daily basis, although longer time gaps are common, and of
course large seasonal gaps are also present in the data set.

The WFCAMCAL data are archived in the WFCAM Science
Archive (WSA; Hambly et al. 2008). The data are processed
by the Cambridge Astronomy Survey Unit (CASU; Irwin et al.
2004) and the Wide Field Astronomy Unit (WFAU) in Edinburgh,
and the latter produces the WSA. The design of the WSA,
the details of the data curation procedures and the layout of
the database are described in detail in Hambly et al. (2008) and
Cross et al. (2009). We use data from the WFCAMCAL08B release
(observations up to the end of UKIRT semester 08B).


2.2. The WFCAMCAL Variable Star Catalogue 

Ferreira Lopes et al. (2015) performed a comprehensive stellar
variability analysis of the WFCAMCAL database and presented
the photometric data and characteristics of the identified variable
stars as the WFCAM Variable Star Catalogue (WVSC1).
The authors used standard data-mining methods and introduced
new variability indices designed for multiband data with correlated
sampling. To summarize, the authors performed a careful
analysis using cutoff surfaces to obtain a preselection of 6651
stars based on criteria established by numerical tests of the noise
characteristics of the data. Next they combined four frequency
analysis methods to search for the real frequencies in the LCs in
each waveband and in the chromatic LC, i.e. the LC comprised of the
sum of all broadband filters. They then obtained a ranked list
of the best periods for each method and selected the very best period,
which gave the minimum $\chi^2$, in order to cope with aliasing.
Finally, the authors visually inspected all the phase diagrams of
the 6651 stars and recovered a catalogue containing 319 stars, of
which 275 are classified as periodic variable stars and 44 objects
as suspected variables or apparently aperiodic variables.

In this paper we analyse this same sample from
Ferreira Lopes et al. (2015). First, we selected all sources classified
as a star or probable star having at least ten unflagged epochs
in any of the five filters. This selection was performed from an
initial database of 216,722 stars. Next we test the efficiency of
selection of variable stars using the variability indices presented
in Sect. 3.

3. Variability Indices 

Table 1 summarises 12 variability indices, of which 5 are new
indices proposed in this work. The present work discusses the
efficiency of selection of each one and the best way
to select variable stars using the current tool inventory. Sets of
variability indices, rather than a single index, have been used to improve the
selection process during the last few years (e.g. Shin et al. 2009).
Indeed automatic classifiers are also using these parameters to
facilitate the classification of variable stars (e.g. Richards et al.
2011). The variability indices are a fundamental tool for improving
all processes of the time domain analysis.

Table 1. Variability indices analysed in the present work. The terms used in these indices are described in Sects. 3.1 and 3.2.

Index                Definition                                                                                                        Reference
$I_{WS}$             $\sqrt{\frac{1}{n(n-1)}} \sum_{n=1}^{N-1} \left(\frac{x_n-\bar{x}}{e_n}\right)\left(\frac{x_{n+1}-\bar{x}}{e_{n+1}}\right)$   Welch & Stetson 1993
$J_{WS}$             $\sum_{n=1}^{N-1} \mathrm{sign}(\delta_n \delta_{n+1}) \sqrt{|\delta_n \delta_{n+1}|}$                              Stetson 1996
$K_{WS}$             $\frac{\frac{1}{N}\sum_{i=1}^{N}|\delta_i|}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\delta_i^2}}$                             Stetson 1996
$L_{WS}$             $(J_{WS} \cdot K_{WS})/0.798$                                                                                       Stetson 1996
$I_{pfc}^{(s)}$      see Eq. (1)                                                                                                       Ferreira Lopes et al. 2015 (1)
$I_{fi}^{(s)}$       see Eq. (4)                                                                                                       Ferreira Lopes et al. 2015
$K_{fi}^{(s)}$       $N_s^{+}/N_s$                                                                                                     the present work
$L_{pfc}^{(s)}$      see Eq. (10)                                                                                                      the present work
$M_{pfc}^{(s)}$      $\mathrm{med}[Q^{(s)}]$                                                                                           the present work
$FL_{pfc}^{(s)}$     $F^{(s)} \times L_{pfc}^{(s)}$                                                                                    the present work
$FM_{pfc}^{(s)}$     $F^{(s)} \times M_{pfc}^{(s)}$                                                                                    the present work

(1) Unfortunately the first version of the indices was incorrectly defined. Therefore, the authors have since added an erratum with the correct form.


Currently, the Welch-Stetson indices (Welch & Stetson
1993; Stetson 1996; i.e. the $I_{WS}$, $J_{WS}$, $K_{WS}$ and $L_{WS}$ indices)
are found to be significantly more sensitive than the "traditional"
$\chi^2$-test for single variance, which uses the magnitude-rms
scatter distribution of the data as a predictor (e.g. Pojmanski
2002). The improvements proposed by Stetson (1996) on $I_{WS}$
(Welch & Stetson 1993) and incorporated in the $J_{WS}$ index allow
us to compare wavebands with different numbers of epochs on
an equal basis. The author uses the Bessel correction ($\sqrt{n/(n-1)}$) to
reduce the bias related to the sample size, despite the index being
the square of the correlation and not the mean variance. The $I_{WS}$
index was modified, to quantify panchromatic flux correlations,
to form new variability indices ($I_{pfc}^{(s)}$) by Ferreira Lopes et al.
(2015). These were the first variability indices developed to analyse
panchromatic surveys. Moreover the authors proposed a new
set of flux independent variability indices ($I_{fi}^{(s)}$).


The statistical period search based on the analysis of variance
(ANOVA; Schwarzenberg-Czerny 1996) has been used to select
non-stochastic variations. Nevertheless, this method is limited to the
identification of periodic variations and requires more running
time, since its significance level is determined on phase diagrams
for each test frequency. Using variability indices we can discriminate
non-stochastic variations independently of their nature
and reduce the running time. The main goal of this work is to determine
the best way to select variable stars without computing
the variability periods. In the following subsection we summarize
the $I_{pfc}^{(s)}$ and $I_{fi}^{(s)}$ variability indices as well as improvements on
these indices using a new approach.


3.1. The $I_{pfc}^{(s)}$ and $I_{fi}^{(s)}$ panchromatic variability indices

The current tool inventory was extended by Ferreira Lopes et al.
(2015) with a new set of variability indices to separate LCs that
are dominated by correlated variations from those that are
noise-dominated. The authors introduced a new set of variability
indices designed for multi-band data with correlated sampling
that included one index that is highly insensitive to the presence
of outliers in the time-series data. First, the authors extended $I_{WS}$
to create the $I_{pfc}^{(s)}$ index defined as,

$$ I_{pfc}^{(s)} = \frac{1}{n_s} \sum_{i=1}^{n} \left[ \sum_{j_1=1}^{m-(s-1)} \cdots \left( \sum_{j_s=j_{(s-1)}+1}^{m} \Lambda^{(s)}_{i j_1 \cdots j_s} \sqrt[s]{\left| \Gamma_{u_{i j_1}} \cdots \Gamma_{u_{i j_s}} \right|} \right) \right] \qquad (1) $$

where m is the number of filters, s is the combination type (between
two or more epochs), $n_s$ is the total number of correlations,
and the $\Lambda^{(s)}$ correction factor is,

$$ \Lambda^{(s)}_{i j_1 \cdots j_s} = \begin{cases} +1 & \text{if } \Gamma_{u_{i j_1}} > 0, \cdots, \Gamma_{u_{i j_s}} > 0; \\ +1 & \text{if } \Gamma_{u_{i j_1}} < 0, \cdots, \Gamma_{u_{i j_s}} < 0; \\ -1 & \text{otherwise,} \end{cases} \qquad (2) $$

and $\Gamma$ is given by,



(3) 
These indices allow us to compute correlations among s
epochs. As shown by the authors, increasing the number of correlated
wavebands (s) makes the separation between correlated
and uncorrelated variables more evident. Next the authors proposed
a new index, $I_{fi}^{(s)}$, using Eq. 2, that is the sum of discrete
values 1 or -1. This index is defined as

$$ I_{fi}^{(s)} = 0.5 \left[ 1 + \frac{1}{n_s} \sum_{i=1}^{n} \left( \sum_{j_1=1}^{m-(s-1)} \cdots \sum_{j_s=j_{(s-1)}+1}^{m} \Lambda^{(s)}_{i j_1 \cdots j_s} \right) \right] \qquad (4) $$

where $0 \le I_{fi}^{(s)} \le 1$, and where $\Lambda^{(s)}$ is defined in Eq. 2.

Finally the authors propose a general expression to determine
the probability of a random event leading to a positive $I_{fi}^{(s)}$ index.
In the case of statistically independent events, this is given by,



/'ey 

On the other hand the expected value of $I_{pfc}^{(s)}$ for a random
distribution is about 0. Meanwhile, the number of sources with
negative values increases with s, because there is an increase in
the number of possible combinations that give a negative correlation.


3.2. Improvements on panchromatic and flux independent 
indices 

The $I_{pfc}^{(s)}$ variability indices (see Eq. 1) would work equally well
if a set of observations were in the same bandpass, provided that we correlated
groups of observations taken over a short interval. Therefore
we need to modify the $I_{pfc}^{(s)}$ indices to make them still more robust
against different numbers of observations in each group.
Similarly, we propose a new panchromatic, flux independent,
variability index ($K_{fi}^{(s)}$) and additionally combine these indices
to create two new indices. In order to provide an expression to
be used in multi or single waveband data we propose the following
notation:

1. First, we compute the values of $\delta$, given by

$$ \delta_i = \sqrt{\frac{n_x}{n_x - 1}} \cdot \left( \frac{x_i - \bar{x}}{\sigma_{x,i}} \right) \qquad (6) $$

where $n_x$ is the number of epochs of waveband x, $x_i$ are the
flux measurements, $\bar{x}$ is the mean flux and $\sigma_{x,i}$ denotes the
flux errors. This parameter is equal to that used by Stetson
(1996) to improve the $I_{WS}$ index and, according to him, the
measurements of correlations using $\delta_i$ allow us to compare data
from different wavebands with unequal numbers of observations
on an equal basis.

2. Next, the $\delta_i$ values are computed for all measurements in
each waveband using the respective values of $n_x$. As a result
we obtain a vector $\delta$ with N measurements collected in
any waveband. In addition, we save the observation time for
each $\delta_i$ value.

3. We determine the time interval ($\Delta T$) for which measurements
enclosed in this interval will be considered to be at
virtually the same epoch. The chosen $\Delta T$ value comes from
the arrangement of epochs and thus gives the minimum period
(see Sect. 4.3). The total number of boxes ($N_{box}$) is given
by $T_{tot}/\Delta T$, where $T_{tot}$ is the total time span. The accuracy
of the index should increase as $\Delta T$ decreases.


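4. Next we compute the value of the variability index of order
s in the k-th box, which contains $n_k$ measurements:

$$ Q_k^{(s)} = \begin{cases} \sum_{j_1=1}^{n_k-(s-1)} \cdots \sum_{j_s=j_{(s-1)}+1}^{n_k} \Lambda^{(s)}_{j_1 \cdots j_s} \sqrt[s]{\left| \delta_{j_1} \cdots \delta_{j_s} \right|} & \text{if } n_k \ge s \\ 0 & \text{if } n_k < s \end{cases} \qquad (7) $$

This equation performs all possible combinations without
repetition among the $\delta_i$ values. Indeed the total number of
combinations calculated is given by,

$$ N_s = \sum_{k=1}^{N_{box}} \frac{n_k!}{s!\,(n_k - s)!} \qquad (8) $$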


5. Now, we can express the flux independent indices in a simple
expression given by,

$$ K_{fi}^{(s)} = \frac{N_s^{+}}{N_s} \qquad (9) $$

where $N_s^{+}$ is the number of positive correlations according
to Eq. 2. The $K_{fi}^{(s)}$ indices include measurements obtained in one
filter, in contrast to the $I_{fi}^{(s)}$ indices, where measurements are
obtained in different filters.

6. Next, we can compute the $L_{pfc}^{(s)}$ index given by,

$$ L_{pfc}^{(s)} = \frac{1}{N_s} \sum_{k=1}^{N_{box}} Q_k^{(s)} \qquad (10) $$

which reduces to $I_{pfc}^{(s)}$ in the case when we only have measurements
in different filters. $L_{pfc}^{(s)}$ can be used to perform
comparisons on an equal basis between stars with different
numbers of epochs, as well as using measurements obtained
in one filter, in contrast to the $I_{pfc}^{(s)}$ and $K_{fi}^{(s)}$ indices.

7. An alternative way to compute the characteristic value of
the correlation is to compute the median of the correlations, given
by

$$ M_{pfc}^{(s)} = \mathrm{med}[Q^{(s)}], \qquad (11) $$

where $Q^{(s)}$ encloses all $Q_k^{(s)}$ correlations. The median value
may provide a more robust value than the mean in the presence
of outliers.

8. Finally, we can use the $K_{fi}^{(s)}$ in a correction factor related to
instrument properties and outliers. Such a factor can be defined
as,

$$ F^{(s)} = \begin{cases} 2\,(K_{fi}^{(s)} - P_s) & \text{if } K_{fi}^{(s)} > P_s \\ 0 & \text{otherwise} \end{cases} \qquad (12) $$

where $P_s$ is the expected value of pure noise for $K_{fi}^{(s)}$. $F^{(s)}$
ranges from 0 to $2 \times (1 - 2/s^2)$, providing an increase of its
weight with s values. For instance, the maximum value of
$F^{(s)}$ is 1 for s = 2 and ~1.6 for s = 3. $F^{(s)}$ is more efficient
than $K_{fi}^{(s)}$ because this increases the difference between values
of correlated and uncorrelated data and we concentrate
the pure noise values about zero. $F^{(s)}$ is used to provide a
new set of indices given by $FL_{pfc}^{(s)}$ and $FM_{pfc}^{(s)}$ (see Table 1).
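The steps above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation. It assumes that the normalised residuals follow the Stetson (1996) form $\sqrt{n/(n-1)}\,(x_i - \bar{x})/\sigma_i$, that boxes lie on a fixed grid of width $\Delta T$ starting at the first epoch, that each combination contributes a sign-weighted s-th root of the product of residuals, and that the median is taken over the per-box sums. The function names `stetson_delta` and `correlation_indices` are hypothetical.

```python
from itertools import combinations

import numpy as np


def stetson_delta(flux, err):
    """Normalised residuals of one waveband: sqrt(n/(n-1)) * (x_i - mean) / sigma_i."""
    flux = np.asarray(flux, dtype=float)
    err = np.asarray(err, dtype=float)
    n = flux.size
    return np.sqrt(n / (n - 1.0)) * (flux - flux.mean()) / err


def correlation_indices(times, deltas, delta_t, s=2):
    """Return N_s, K_fi, L_pfc and M_pfc for correlations of order s.

    Measurements from any waveband are grouped into boxes of width delta_t
    (assumed here to lie on a fixed grid starting at the first epoch).
    """
    times = np.asarray(times, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    box_id = np.floor((times - times.min()) / delta_t).astype(int)

    q_boxes = []   # Q_k: summed signed correlations of each usable box
    n_s = 0        # total number of combinations
    n_plus = 0     # combinations whose members all share one sign
    for k in np.unique(box_id):
        in_box = deltas[box_id == k]
        if in_box.size < s:      # box cannot form a correlation of order s
            continue
        q_k = 0.0
        for combo in combinations(in_box, s):
            combo = np.array(combo)
            same_sign = bool(np.all(combo > 0) or np.all(combo < 0))
            sign = 1.0 if same_sign else -1.0
            q_k += sign * np.abs(np.prod(combo)) ** (1.0 / s)
            n_s += 1
            n_plus += int(same_sign)
        q_boxes.append(q_k)

    if n_s == 0:
        raise ValueError("no box contains at least s measurements")
    return {
        "N_s": n_s,
        "K_fi": n_plus / n_s,
        "L_pfc": float(np.sum(q_boxes)) / n_s,
        "M_pfc": float(np.median(q_boxes)),
    }
```

For multi-band data, `stetson_delta` would be applied to each waveband separately and the resulting residuals concatenated, together with their observation times, before calling `correlation_indices`.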
The variability indices proposed above are determined using
properties related to the magnitude and signal correlation
values. Stellar variability searches based on such indices follow
general assumptions: (i) intrinsic stellar variability can typically
be identified from analysis of correlation measures observed
in multiple or single wavebands; (ii) there is a minimum number
of correlations required to discriminate stochastic and non-stochastic
variations (see Sect. 4.1); (iii) the interval between any
2 observations ($\tau$) used to compute the correlations must be sufficiently
phase-locked (see Sect. 4.3); (iv) non-intrinsic variations
will typically be stochastic. Indeed measurements affected by
systematics of instrumental and atmospheric origin, or by
possible data reduction anomalies, that display correlated properties
may decrease the confidence level of variability indices.
Such measurements are mainly related to temporal saturation
of bright objects as well as systematic variations in the sky noise
for faint stars.



Fig. 1. Efficiency metric ($E_{tot}$: the ratio of total number of sources in the
selection to good known variables in WVSC1) using Eq. 16 for s = 2
(black lines) and for s = 3 (grey lines). Solid lines mark $E_{WVSC1}$, the
fraction of the good variables in the selection, while the dashed lines
mark $E_{tot}$. A good choice of $\alpha$ returns a high fraction of good variables
$E_{WVSC1}$ for a low value of the efficiency metric $E_{tot}$.


4. Detection limits of correlated variability indices 

The number of measurements and how many measurements are
'close' - i.e. within a time span much shorter than the period of
any variability - are fundamental information necessary to set
better variability indices. The number of measurements will determine
how stringent the cutoff values must be, while the number
of close measurements will set the most appropriate variability
index. To determine which measurements are close or not we
need to determine a $\Delta T$ value that is a compromise between
the number of correlated measurements and the minimum
period that we are searching for. For instance, variability indices
computed in boxes of $\Delta T$ greater than the period will return
values closer to those expected for noise. Lower values of $\Delta T$
lead to higher accuracy for variability indices that use correlation
measurements.

Moreover, variability indices can be used in all processes of
the time-series analysis, as discussed in previous sections.
Therefore we must find new ways that allow us to increase the
precision and reliability of these indices and their connections with
the different types of variability. In Sect. 3 we described the current
tool inventory and we proposed new variability indices with
new correction factors to reduce bias. In the present section we
propose new ways to increase the precision of these variability
indices as well as how to evaluate their reliability.

4.1. Number of correlated measurements

The minimum number of measurements that is enough for the
use of a variability index will be determined by the capability
of separating variable and non-variable stars. Statistical
properties like the mean, standard deviation, skewness and kurtosis
will be strongly dependent on the number of measurements. On
the other hand, we need contemporary (close) measurements to
use correlated variability indices. These features may change for
each variability index.

Consider two cases for s = 2: one with $N_s$ correlated data points and
the other with $N_s$ points of pure noise. In the case of pure
noise, the numbers of positive and negative correlations must be
the same ($N_s^{+} = N_s/2$), while for correlated data $N_s^{+} = N_s$. Using
the $K_{fi}^{(s)}$ indices we can determine the minimum number of correlations
necessary to separate a purely correlated signal from
pure noise, assuming that there is an uncertainty in the sign of
some correlations. We assume that this uncertainty (or fluctuation)
on the number of positive correlations, given by $n_f$, provides
an increase in the variability index of pure noise and a decrease
otherwise. So the minimum separation between correlated and
uncorrelated data is given by,


$$ \Delta K_{fi}^{(2)} = \frac{N_2 - n_f}{N_2} - \frac{N_2/2 + n_f}{N_2} > 0 \;\Rightarrow\; N_2^{min} = 4\,n_f, \qquad (13) $$

where $n_f$ is an integer with values less than $N_2/2$. The minimum
number of correlations must be at least 5 according to this relation,
given a single error. The general expression is given by,


$$ \Delta K_{fi}^{(s)} = \frac{N_s - n_f}{N_s} - \frac{P_s N_s + n_f}{N_s} > 0 \;\Rightarrow\; N_s^{min} = \frac{2\,n_f}{1 - P_s}, \qquad (14) $$

where the minimum value ($\Delta K_{fi}^{(s)}$) to validate this relation is
given by,

$$ \Delta K_{fi}^{(s)} = 1 - P_s - \frac{n_f}{N_s}, \qquad (15) $$

where $n_f/N_s$ is the fractional fluctuation of positive correlated
measures ($f_{fluc}$). This equation explains analytically the
increase in precision as well as the detachment between correlated
and non-correlated data with increasing s (see Figure 8 of
Ferreira Lopes et al. 2015). $\Delta K_{fi}^{(s)} \to 1$ with increasing s, so
correlated data becomes more easily separable from pure noise
with increasing s.

Fig. 2. $K_{fi}^{(s)}$ versus $\Delta T/P$ for s = 2 (black lines) and for s = 3 (grey
lines). The dashed lines mark the expected values for random variation.
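As a worked check of the relation $N_s > 2 n_f/(1 - P_s)$ discussed in this subsection, the short sketch below evaluates the separation between a correlated signal and pure noise when $n_f$ correlations have an uncertain sign. The function names are hypothetical, and $P_s$ is passed in as a parameter rather than computed; the value 0.5 used in the example is the pure-noise expectation for s = 2 stated in the text.

```python
import math


def separation(n_s, n_f, p_s):
    """Separation between correlated data and pure noise.

    Correlated data lose n_f positive correlations, pure noise gains n_f.
    """
    correlated = (n_s - n_f) / n_s
    noise = (p_s * n_s + n_f) / n_s
    return correlated - noise


def min_correlations(n_f, p_s):
    """Smallest integer N_s that satisfies N_s > 2 * n_f / (1 - p_s)."""
    bound = 2.0 * n_f / (1.0 - p_s)
    return math.floor(bound) + 1


# One uncertain sign (n_f = 1) with s = 2 (p_s = 0.5):
# four correlations give no separation, five give a positive one.
print(separation(4, 1, 0.5))
print(separation(5, 1, 0.5) > 0)
print(min_correlations(1, 0.5))
```

The strict inequality is why the bound of 4 for s = 2 and a single error translates into at least 5 correlations.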


4.2. False Alarm Probability on variability indices 


The statistical significance of a value is associated with the False
Alarm Probability (FAP). The FAP is the probability that the observed
value was caused by random fluctuations. The smaller the
FAP, the larger the statistical significance of the
measurement, and the tolerance usually adopted is about 1%. The
determination of the FAP in period searches is hindered by
non-Gaussian distributions, observations scattered irregularly over a
long time span, the unclear meaning of the number of independent
frequencies and the manner in which these should be taken
into account (Süveges 2014).

Additionally, significance values may depend on the function
employed to make the periodicity search. Therefore, in some
cases, this points to the use of different techniques on different
types of variable stars (Templeton 2004). For instance, the periodicity
search methods based on Fourier series will be less sensitive
to non-sinusoidal and aperiodic signals. Recent work based
on the analysis of variance (Schwarzenberg-Czerny 1996) and
multiharmonic periodograms (Baluev 2009, 2013) is allowing
us to assess more complex signals more easily. This process may
be facilitated if we can first determine whether the time series has
reliable variability or not.

We can consider the significance of the variability indices
by comparing the null hypothesis $H_0$ that the observed time series
is purely noise against the alternative $H_1$ stating that there is
a correlated signal in it. One way to evaluate statistical fluctuations
on variability indices is to generate a large number of test
time-series sequences by shuffling the times ("bootstrapping").
Following this approach, we are able to keep part of the correlated
nature of the noise intrinsic to the data, as opposed to numerical
tests based on pure Gaussian noise (Ferreira Lopes et al.
2015). However, we need many iterations to provide accurate
values for a FAP, i.e. we need to compute the variability indices
n times. For instance, at least 100 iterations are necessary to get
a FAP of 0.01 and this implies a longer running time.
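The shuffling approach described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used for the results in this paper: the statistic is a simplified s = 2 index (the fraction of same-sign pairs among consecutive measurements closer than $\Delta T$), the values are permuted against fixed times, and the FAP estimator (count + 1)/(n_iter + 1) is a common convention that is assumed here. The function names are hypothetical.

```python
import numpy as np


def k_fi_pairs(times, deltas, delta_t):
    """Fraction of same-sign pairs among consecutive measurements closer than delta_t."""
    order = np.argsort(times)
    t = np.asarray(times, dtype=float)[order]
    d = np.asarray(deltas, dtype=float)[order]
    close = np.diff(t) <= delta_t
    if not close.any():
        return np.nan
    same_sign = (d[:-1] * d[1:]) > 0
    return float(same_sign[close].mean())


def shuffle_fap(times, deltas, delta_t, n_iter=200, seed=0):
    """Estimate the false alarm probability of the observed index by shuffling."""
    rng = np.random.default_rng(seed)
    observed = k_fi_pairs(times, deltas, delta_t)
    count = 0
    for _ in range(n_iter):
        shuffled = rng.permutation(deltas)   # breaks the time ordering of the values
        if k_fi_pairs(times, shuffled, delta_t) >= observed:
            count += 1
    return (count + 1) / (n_iter + 1)


# Example: 200 close pairs sampling a noisy sinusoid (hypothetical data).
rng = np.random.default_rng(42)
nights = np.arange(200, dtype=float)
times = np.concatenate([nights, nights + 0.001])
signal = np.sin(2 * np.pi * times / 37.3) + rng.normal(0.0, 0.05, times.size)
deltas = signal - signal.mean()
fap = shuffle_fap(times, deltas, delta_t=0.01)
```

With this estimator the smallest reachable FAP is 1/(n_iter + 1), which is why on the order of 100 or more iterations are needed to resolve a FAP of 0.01.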

On the other hand an analytical expression of the FAP for variability
indices may depend on the deviation from the mean, which
will vary according to the survey analysed since it must depend
on detector efficiency, number of measurements, magnitude, etc.
However, $K_{fi}^{(s)}$ has a weak dependence on the properties of the
survey, for which Eq. 15 provides a value above which correlated
sources may be distinguished from noise. The only term to
be determined is the fractional fluctuation in positive correlated
measurements ($f_{fluc} = n_f/N_s$). From our results, we propose the
following empirical equation,

$$ f_{fluc} = \frac{n_f}{N_s} = \alpha - \sqrt{\frac{\beta}{N_s}}, \qquad (16) $$

i.e. a constant ($\alpha > 0$) and a term related to the number of correlations.
The second term in this equation decays quickly to zero
as $N_s$ increases and it provides strong cutoff values on data with
few epochs. $\beta < \alpha^2 N_s^{min}$, since the $f_{fluc}$ for $N_s^{min}$ must be greater
than 0 for any number of correlations. Large values of $\alpha$ give a
more complete selection while smaller values result in a more
reliable sample. Figure 1 shows the number of sources selected
using Eq. 16 for s = 2 (black lines) and s = 3 (grey lines) as
a function of $\alpha$ values. Solid lines mark the fraction
of WVSC1 stars, while the dashed lines mark the efficiency of
selection (ratio of total number of sources to number of known
variables). The total number of sources is normalized by the
total number of WVSC1 stars (319) to give an efficiency metric
($E_{tot}$). Table 2 shows the number of total sources selected ($E_{tot}$)
and the fraction of WVSC1 stars ($E_{WVSC1}$) for some $\alpha$ values.
For instance, to select about 90% of WVSC1 stars we need a
sub-sample of 3.77 x 319 stars using $\alpha$ = 0.30 for s = 2. On
the other hand, to select about 92% of WVSC1 stars we need a
sub-sample of 3.71 x 319 stars using $\alpha$ = 0.48 for s = 3.

Fig. 3. Histogram of the interval between observations for the WFCAMCAL08B
data. $\tau$ is the interval between any 2 observations regardless
of filter and these are binned logarithmically. The peaks at ~$10^{-3}$ days
are intervals during a ZYJHK sequence and the peaks at ~1 day and
multiples of 1 day are repeat observations on subsequent nights.
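To make the behaviour of the empirical fluctuation term concrete, the sketch below evaluates a fractional fluctuation of the assumed form $\alpha - \sqrt{\beta/N_s}$ and the corresponding value $1 - P_s - f_{fluc}$ from Eq. 15. The functional form, the parameter name `beta` and the numerical values in the example are assumptions made for illustration only; they are not fitted values from this work.

```python
import math


def f_fluc(n_s, alpha, beta):
    """Assumed form of the fractional fluctuation: alpha - sqrt(beta / N_s)."""
    return alpha - math.sqrt(beta / n_s)


def delta_k_min(n_s, alpha, beta, p_s):
    """Value 1 - P_s - f_fluc, with f_fluc taken from the assumed form above."""
    return 1.0 - p_s - f_fluc(n_s, alpha, beta)


# Illustrative numbers only: alpha = 0.30, beta = 1.0, p_s = 0.5 (s = 2).
for n_s in (25, 100, 10000):
    print(n_s, round(f_fluc(n_s, 0.30, 1.0), 3), round(delta_k_min(n_s, 0.30, 1.0, 0.5), 3))
```

The loop shows the qualitative point made in the text: for few correlations the square-root term dominates and the cutoff is stringent, while for many correlations $f_{fluc}$ approaches the constant $\alpha$.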

4.3. ΔT estimate and correlated observations

The variation of variability indices with ΔT will depend on many
factors, such as the variability period (P), the shape of the light
curve, the signal-to-noise, and outliers. In order to estimate the
influence of ΔT we simulate a pure sinusoidal variation with
a period P. Next, we compute the $K^{(s)}_{fi}$ indices and see how
changing ΔT as a function of P affects how well we can separate
a sinusoidal signal from random noise.

Fig. 2 shows the indices as a function of ΔT/P. The index
decreases quickly to the values expected for random variations
when s = 2 while, when s = 3, it remains higher than the
expected noise value for all ΔT/P. This result helps us to understand
and use ΔT values. For instance, if ΔT < 0.1P, we will
get large values for $K^{(s)}_{fi}$, clearly separated from noise, and thus
detect variability more easily.
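
The experiment described above can be sketched for s = 2 as follows. This is a minimal illustration and not the code used for Fig. 2: it assumes that $K^{(2)}_{fi}$ is the fraction of measurement pairs, separated by ΔT, whose deviations from the mean have the same sign, and the function name and its parameters are hypothetical.

```python
import math
import random


def k_fi_pairs(period, delta_t, n_pairs=20000, noise_sigma=0.0,
               amplitude=1.0, seed=1):
    """Fraction of positive correlations for pairs of points separated by delta_t.

    Assumption: a pair counts as a positive correlation when both deviations
    from the mean magnitude have the same sign.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        t = rng.uniform(0.0, 1000.0 * period)  # random phase
        m1 = amplitude * math.sin(2 * math.pi * t / period)
        m2 = amplitude * math.sin(2 * math.pi * (t + delta_t) / period)
        m1 += rng.gauss(0.0, noise_sigma)
        m2 += rng.gauss(0.0, noise_sigma)
        pairs.append((m1, m2))
    mean = sum(m1 + m2 for m1, m2 in pairs) / (2 * len(pairs))
    positive = sum(1 for m1, m2 in pairs if (m1 - mean) * (m2 - mean) > 0)
    return positive / len(pairs)


# Noise-free sinusoid with delta_t = 0.1 P, and pure noise for comparison.
k_signal = k_fi_pairs(period=1.0, delta_t=0.1)
k_noise = k_fi_pairs(period=1.0, delta_t=0.1, noise_sigma=1.0, amplitude=0.0)
```

For a noise-free sinusoid sampled at random phases, the two measurements have opposite signs only when a zero crossing falls between them, so the expected fraction is 1 - 2ΔT/P for ΔT < P/2 (0.8 at ΔT = 0.1P), whereas pure noise gives about 0.5.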

ΔT will be determined predominantly by the cadence of
the data. For the WFCAMCAL data, a sequence of 3 to 5 filters
was observed over a period of ~0.005 day, and then each
pointing was reobserved roughly 1 day later, with longer intervals
because of weather or seasonal limits on the observations (see
Sect. 2.1). This is displayed in Fig. 3, which shows the histogram
of time between subsequent observations. There is a strong peak
at τ ~ 10^-3 day, which corresponds to a ZYJHK sequence with a
duration of ~0.005 day, a second peak at ~1 day, and a variety
of smaller peaks at other durations, often a few days apart (bad
weather, non-photometric nights), and a few small ones at long
Table 2. Efficiency metric for some α values.

          s = 2                         s = 3
   α     E_WVSC1    E_tot        α     E_WVSC1    E_tot
  0.20     0.57      1.72       0.20     0.31      0.65
  0.22     0.66      1.90       0.24     0.41      0.79
  0.24     0.72      2.18       0.28     0.50      0.96
  0.26     0.78      2.50       0.32     0.64      1.14
  0.28     0.84      3.06       0.36     0.71      1.42
  0.30     0.90      3.77       0.40     0.80      1.79
  0.32     0.93      4.83       0.44     0.88      2.44
  0.34     0.95      6.70       0.48     0.92      3.71
  0.36     0.99      9.58       0.52     0.95      5.86
  0.38     0.99     13.96       0.56     0.98      9.87

durations of tens or hundreds of days (field not observed because
it was too close to the Sun) or on timescales of 0.01 to 0.1 days
(days when many calibration fields were taken for test purposes).
The best choice of ΔT should be the minimum duration that encloses
the correlated observations, up to the end of the last integration.
ΔT = 0.01 day is a sensible choice, as it is slightly wider
than the typical box size, so no observation is missed, and it allows
us to look for variables with P > 0.1 day, although the main sampling
peak at ~1 day may be expected to limit us to P > 2 days,
from the Nyquist frequency of 0.5 days^-1. Since the sampling is not
rigidly at 1 day intervals, shorter periods are possible. Having correlated
sampling at more than 20 times the frequency of the main sampling
rate will avoid additional constraints being applied to the
period range.
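
As an illustration of how a choice of ΔT = 0.01 day groups the observations, the following sketch collects observation times into boxes. The grouping rule (a new box starts when an observation falls more than ΔT after the first observation of the current box) and the function name are assumptions made for this example; the text does not specify the exact algorithm.

```python
def group_into_boxes(times, delta_t=0.01):
    """Group observation times (days) into boxes no longer than delta_t.

    Assumed rule: an observation joins the current box if it lies within
    delta_t of the first observation in that box; otherwise it opens a new box.
    """
    boxes = []
    for t in sorted(times):
        if boxes and t - boxes[-1][0] <= delta_t:
            boxes[-1].append(t)
        else:
            boxes.append([t])
    return boxes


# Two ZYJHK-like sequences one day apart, each lasting about 0.005 day
# (made-up times for illustration).
times = [0.000, 0.001, 0.002, 0.004, 0.005, 1.000, 1.001, 1.003, 1.005]
boxes = group_into_boxes(times)
box_sizes = [len(b) for b in boxes]
```

With these made-up times the first box contains five measurements and the second four, so a filter missing from one sequence simply reduces the number of correlations available in that box.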

However, if we have equally spaced data, we are severely restricted.
If we have s = 2 correlations and a spacing of τ, then
ΔT > 2τ and P > 20τ. Given that at least 2 full periods
are required for a confident identification of periodic behaviour,
at least 40 observations would be required to constrain a very
narrow range of periods. Thus, correlation indices become very
inefficient for equally spaced data. Some deep extragalactic surveys
have observations designed to maximise depth, so the observations
are taken when seeing and sky levels are at their best,
and the observation structure can be pseudo-correlated, but not on
fixed time scales. In Paper 2 of this series we will discuss indices
which work better with uncorrelated observations.

A correlated data set may be expected to have at least half
of the τ values in a peak or small set of peaks (if several filters
with slightly different exposure times are used) at around the correlation
frequency, and then the main sampling peaks at $\tau_{\rm samp} > 20\,\tau_{\rm cor}$.

When we consider VISTA surveys, e.g. the VVV
(Minniti et al. 2010), the data are observed as pawprints, which
then get co-added into tiles (Cross et al. 2012), so there are always
repeat observations on a short time-scale compared to repeat
epochs. These observations are also observed almost contemporaneously,
with some tiling patterns jumping between
the same jitter on different pawprints before moving on to
the next jitter¹, so the time between pawprints will usually be
shorter than the integration time of the pawprint. These data are
therefore ideal for correlated indices applied to the pawprints.

Gaia (Bailer-Jones et al. 2013) is another mission where correlated
indices will be extremely valuable. The main astrometric
instrument observes stars as they transit across 9 strips of detectors
with τ ~ 5 s. Stars are then reobserved by a second field of
view 2 h later, or on the next revolution 6 h later, or on longer time
scales due to the orbit and precession of the spacecraft.


1 http://casu.ast.cam.ac.uk/surveys-projects/vista/technical/tiles 



Fig. 4. Histograms of the number of correlations $N_s$ for s = 2 and s = 3
using bins of width 1.


5. Data analysis 

5.1. Broad selection and Bias 


From Sects. 4.1, 4.2 and 4.3 we can determine the main constraints
on variability analysis. We consider that there is at least
one incorrect correlation measurement ($n_f = 1$) in each LC; therefore
we limit our analyses to sources with more than four correlation
measurements, according to Eq. 14. This is the minimum
number of correlation measurements adopted in our analysis.
Moreover, we adopted ΔT = 0.01 days, based on the duration
of the ZYJHK sequences. By following these constraints,
we consider all LCs that can possibly discriminate a correlated
signal from noise with periods greater than
0.1 days (see Sect. 4.3). We revisited the WFCAMCAL data instead
of testing these variability indices using simple sinusoidal
light curves, since this gives a more realistic test with correlated
observations, real noise values, and a range of variable types.


Next, we compute the $K^{(s)}_{fi}$, $L^{(s)}_{pfc}$, $M^{(s)}_{pfc}$, $FL^{(s)}$, and $FM^{(s)}$
variability indices using a multi-waveband approach, as discussed in
Sect. 3.2, on the data described in Sect. 2.1. Fig. 4 shows the histogram
of the number of correlations $N_s$, ranging from 1 to 1467
for s = 2 and from 1 to 863 for s = 3. These numbers are different
because the ZYJHK measurements are obtained within
a few minutes of each other but not necessarily in all filters.
Additionally, the number of correlation measurements decreases
quickly for very faint objects around the detection threshold. The
total baseline varies from a few months up to three years, and
the cadence in a single passband can be considered to be quasi-stochastic
with rather irregular gaps (see Hodgkin et al. 2009;
Cross et al. 2009; Ferreira Lopes et al. 2015 for a more detailed
discussion).


5.2. Searching for periodic variations 

To search for the best period, a Lomb-Scargle periodogram
(Lomb 1976; Scargle 1982) was computed for each LC. We
set the low-frequency limit ($f_0$) for each periodogram to be
$f_0 = 2/T_{\rm tot}$ days$^{-1}$, where $T_{\rm tot}$ is the total time spanned by the LC.
The high-frequency limit was fixed to $f_N = 10$ days$^{-1}$, and
the periodogram size was scaled to $10^5$ elements. Initially, we
Fig. 5. Correlation variability indices versus K magnitude (left diagram) and versus number of correlations (right diagram) in each panel. The
maximum number of sources per pixel is displayed in brackets in each panel. The black circles mark the WVSC1 sources and the solid and dashed
lines mark the values which enclose 90% and 80% of them.



compute the Lomb-Scargle periodogram independently for each
broadband filter (Y, Z, J, H, and K) as well as for the chromatic
light curve (as described in Ferreira Lopes et al. 2015). The use
of all broadband filters allows us to find variability periods in
cases where the photometry is of high quality in some filters
but not in others: e.g. in some filters the object may be saturated
(sometimes leading to a non-detection at the correct location),
or may be very faint, close to the detection limit, or even too faint
to be detected.
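
The period search can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it implements the classical Lomb-Scargle power in pure Python with the frequency limits quoted above ($f_0 = 2/T_{\rm tot}$, $f_N = 10$ days$^{-1}$), uses a much coarser grid than the $10^5$ elements of the text to keep the example fast, and runs on a simulated noise-free sinusoid. All function names are hypothetical.

```python
import math
import random


def lomb_scargle(times, mags, freqs):
    """Classical Lomb-Scargle power (Scargle 1982 normalization) per frequency."""
    n = len(times)
    mean = sum(mags) / n
    var = sum((m - mean) ** 2 for m in mags) / (n - 1)
    resid = [m - mean for m in mags]
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # Time offset tau that makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        yc = ys = cc = ss = 0.0
        for t, r in zip(times, resid):
            c = math.cos(w * (t - tau))
            s = math.sin(w * (t - tau))
            yc += r * c
            ys += r * s
            cc += c * c
            ss += s * s
        power.append((yc * yc / cc + ys * ys / ss) / (2.0 * var))
    return power


def frequency_grid(t_total, f_max=10.0, n_freq=100000):
    """Linear grid from f_0 = 2 / T_tot up to f_max (cycles per day)."""
    f_min = 2.0 / t_total
    step = (f_max - f_min) / (n_freq - 1)
    return [f_min + i * step for i in range(n_freq)]


# Simulated light curve: irregular sampling over 50 days, period 0.404 day.
rng = random.Random(42)
times = sorted(rng.uniform(0.0, 50.0) for _ in range(80))
mags = [15.0 + 0.3 * math.sin(2.0 * math.pi * t / 0.404) for t in times]

freqs = frequency_grid(times[-1] - times[0], n_freq=4000)  # coarse grid for speed
power = lomb_scargle(times, mags, freqs)
best_period = 1.0 / freqs[power.index(max(power))]
```

In practice the same function would be applied to each filter and to the chromatic light curve separately, keeping the highest peaks from each periodogram for later refinement.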

For each broadband filter as well as for the chromatic
light curve we retain the 10 periods corresponding to the
highest peaks. Next, these periods were refined following
De Medeiros et al. (2013), namely, by maximizing the ratio of
the variability amplitude to the minimum dispersion in the
phase diagram given by Dworetsky (1983). Finally, in order to
select the very best period, we use the χ² test, in the same way
as described in Ferreira Lopes et al. (2015).



Fig. 6. Histograms, as a function of K magnitude, of the sources selected with a constant cutoff value
for the $K^{(s)}_{fi}$ (black line), $L^{(s)}_{pfc}$ and $FL^{(s)}$ (red line), and $M^{(s)}_{pfc}$ and $FM^{(s)}$ (blue
line) indices, normalized by the total number of sources selected in each
one of them. The $E_{\rm tot}$ value for each index is displayed in parentheses. The
upper panel shows the histograms for s = 2 while the lower panel shows
them for s = 3. The cutoff values were determined considering a value
that includes 90% of WVSC1 stars (see Table 3).


6. Results and Discussions 

We analyse the efficiency of variability indices for selecting variable
stars in the WFCAMCAL database. We evaluate the responses
of these indices as a function of magnitude and the number of
correlations. This study allows us to draw conclusions
about the most efficient way to select variable stars. The most efficient
index is the one that encloses the majority of the WVSC1
stars with the fewest stars which do not belong to the WVSC1
catalogue and are mostly misclassifications. The correlation
variability indices were computed using a panchromatic approach,
as described in Sect. 3.2. Below we present our results using
stars from the WVSC1 catalogue as a comparison.

6.1. Efficiency of variability indices 

Fig. 5 shows the distribution of the $K^{(s)}_{fi}$, $L^{(s)}_{pfc}$, $FL^{(s)}$, and $FM^{(s)}$
variability indices as a function of magnitude and the number of
correlations $N_s$, for s = 2 and s = 3. The solid and dashed lines
are set to the values that must be adopted if we want to select
90% and 80% of the final WVSC1 catalogue, respectively. Fig. 6
shows the histogram of these indices as a function of magnitude
for stars selected using the cutoff value that includes 90% of the
WVSC1 catalogue.


- $K^{(2)}_{fi}$ presents a clear separation of WVSC1 stars from the
  other stars for K < 17 mag. The lines that appear for K > 17
  mag are due to the high number of sources with just a few
  epochs (typically $N_s$ < 20). This index produces discrete
  values and these are more evident for a small number of
  epochs. The right panel shows a higher dispersion for low
  values of $N_s$, as expected. Statistical fluctuations may produce
  high contamination in this region despite the number of
  correlations being above the minimum number that allows
  us to discriminate them, according to Eq. 14. $K^{(3)}_{fi}$ shows a
  similar behaviour, although it displays a greater separation of
  WVSC1 stars from the other stars.


- $L^{(s)}_{pfc}$ is equivalent to two previous indices under some constraints:
  it is equal to $J_{WS}$ for s = 2 and equivalent to $I^{(s)}_{pfc}$
  when the correlations are obtained in different filters.
  $J_{WS}$ has been used in the selection criteria for several
  surveys (e.g. Christiansen et al. 2008; McCommas et al.
  2009; Morales-Calderon et al. 2009; Bhatti et al. 2010;
  Shappee & Stanek 2011; Pasternacki et al. 2011).
  $L^{(s)}_{pfc}$ provides a higher selection efficiency than previous indices
  and it performs combinations among measurements
  in ΔT intervals rather than across wavelengths. The $L^{(s)}_{pfc}$ indices
  present a clear separation of WVSC1 stars from other stars.
  If we assume a constant value of $L^{(s)}_{pfc}$ as a selection criterion,
  we observe that most of the non-variable stars selected
  are faint stars. The separation between WVSC1 and other
  stars is clearer for s = 3 than for s = 2. Indeed, the number
  of sources preselected to enclose 90% of WVSC1 stars is
  about 30% smaller for s = 3 than for s = 2. Two main features that
  are expected with increasing s are observed: an increase in
  the number of stars with negative index values and a better
  discrimination between variable and non-variable stars.

- $M^{(s)}_{pfc}$ calculates the median of the correlation values, in contrast
  to the mean encapsulated by $L^{(s)}_{pfc}$. Both mean and median
  values are used to determine the central or typical value
  of a statistical distribution. The weight of the outliers is reduced
  in the median compared to the mean. Outliers in photometric
  data are commonly associated, for brighter stars, with
  non-linearity and saturation. On the other hand,
  the increasing dominance of correlated noise is expected for
  faint stars as the errors become dominated by the sky noise,
  rather than photon statistics. Fig. 6 shows
  an increase in the fractions of stars selected at both the bright
  and faint magnitudes for the $L^{(s)}_{pfc}$ and $M^{(s)}_{pfc}$ indices, implying an
  increase in the misclassification rate. The misclassification
  rate is higher for faint stars when using $M^{(s)}_{pfc}$ than when using
  $L^{(s)}_{pfc}$, and vice versa for bright stars. Therefore stars that
  match both criteria should have a lower misclassification rate
  at both the bright and faint ends, so agreement between these
  indices should be considered as a selection criterion.

  Using larger s values gives less weight to outliers and
  partially correlated noise, and this leads to a better estimation
  of the centre of the distribution for both the mean and

[Figure 7 panels: WVSC-335 (period 0.404 d), WVSC-337 (period 0.111 d), WVSC-338 (period 0.179 d); magnitude versus time (days) on the left and versus phase on the right.]

Fig. 7. LCs and phase diagrams of the C1 catalogue. The identifiers and periods are displayed above each panel.


  median. Therefore, the efficiency of $L^{(s)}_{pfc}$ and $M^{(s)}_{pfc}$ will increase,
  and the different indices tend to have similar $E_{\rm tot}$ values
  for higher s, especially if sources with small numbers
  of correlations are removed, as observed in Table 3 for s = 3
  and $N_s$ > 20. Meanwhile, the $K^{(s)}_{fi}$ are the best indices to
  perform a selection of variable stars when we only consider
  higher s values as well as only those sources with $N_s$ > 20
  (see Table 3).

- $FL^{(s)}$ and $FM^{(s)}$ provide better efficiency values ($E_{\rm tot}$)
  among those indices computed from correlation magnitudes
  (see Table 3). The $F^{(s)}$ factor provides a concentration of
  non-variables with values around zero as well as a reduction
  in the spread of bright sources. We observe a reduction
  of more than 300% in the number of sources pre-selected by the
  $L^{(s)}_{pfc}$ and $M^{(s)}_{pfc}$ indices for s = 2 and about 20% for s = 3 (see
  Fig. 6). The large reduction is not found for s = 3 because
  these indices become more accurate with increasing s values
  and this therefore decreases the weight of $F^{(s)}$. On the other
  hand, only a slight decrease in $E_{\rm tot}$ was observed when we
use as correction factor TTws/0.789 (see Table [3]). Such fac¬ 
tor is used to build the Lws Stetson index dStetsonll 19961) that 
can be expressed by Lws ~ L 7 x Wws/0.789. 
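
The difference between a mean-based index such as $L^{(s)}_{pfc}$ and a median-based index such as $M^{(s)}_{pfc}$ can be illustrated with a toy example. The correlation values below are invented for illustration and are not computed with the definitions of either index; the sketch only shows how a single outlier (for example, from a saturated epoch) shifts the mean but not the median.

```python
from statistics import mean, median

# Hypothetical correlation values for a non-variable source: scattered around
# zero, plus one large value caused by, e.g., a saturated epoch.
correlations = [-0.4, 0.3, -0.2, 0.1, 0.2, -0.1, 12.0]

mean_value = mean(correlations)      # pulled upward by the single outlier
median_value = median(correlations)  # stays close to the bulk of the values
```

Here the mean is 1.7 while the median is 0.1, so a constant cutoff applied to the mean would select this source, whereas the same cutoff applied to the median would not.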


Table 3. Efficiency metric ($E_{\rm tot}$) for the variability indices analysed, to select
90% and 80% of WVSC1 stars for $N_s$ > 4 and $N_s$ > 20, respectively.

                       N_s > 4                     N_s > 20
Index            E_tot(90%)  E_tot(80%)     E_tot(90%)  E_tot(80%)
K^(2)_fi            12.6         8.8            4.7         2.9
K^(3)_fi             8.9         5.2            3.0         1.7
L^(2)_pfc           40.1        21.5           27.3        14.7
L^(3)_pfc           14.7         8.8            8.6         5.5
M^(2)_pfc           65.5        29.7           48.4        20.0
M^(3)_pfc           15.0         9.9            7.4         4.9
FL^(2)              24.0        13.2           14.6         7.9
FL^(3)              12.0         7.4            6.7         4.4
FM^(2)              27.6        15.1           16.7         8.9
FM^(3)              13.8         8.5            6.5         4.0
L_WS                38.4        20.1           25.6        13.2


Summarizing, the correlation variability indices discriminate
between uncorrelated and correlated data, the latter being a typical feature
of variable stars. We can enclose almost all WVSC1 stars in a
sample with fewer than about 1500 sources. These indices still
present a low efficiency for discrimination at the faint end or
bright end. The root cause in each case may well be different:
excess high values for bright sources are due to temporal saturation
and few epochs, while increases in measurement uncertainties
for the faint stars make the variability indices more sensitive
to statistical fluctuations and systematics.

Fig. 8. Examples of LCs showing instrumental bias. The identifiers are displayed above each panel.

Figure 6 shows the histograms of sources selected for a constant
cutoff value for the higher ranked indices. These indices
return most of the WVSC1 stars with fewer non-variable stars
(see Table 3). However, these indices have a clear selection bias
with magnitude, since the selected sources are not
evenly distributed along all K values. For s = 2 we observe that
$K^{(s)}_{fi}$ (black line) and $FL^{(s)}$ (red line) display a prominent over-selection
of faint stars, while $FM^{(s)}$ (blue line) is biased for both
bright and faint stars. On the other hand, the three indices have
similar bias for s = 3. Such a result indicates that the efficiency
of these indices may be similar for higher s values, since the difference
in $E_{\rm tot}$ between them is smaller for s = 3. Indeed, more
than 60% of stars with K > 17.5 have $N_s$ < 20. This low detection
efficiency for the instrument in this region gives a maximum
magnitude limit where we can sensibly use these indices.

On the other hand, if we use the analytical expression for
$f_{{\rm fluc},s}$ (see Sect. 4.2) we obtain a higher efficiency. This function
allows us to analyse stars with lower $N_s$ values (above
the minimum number of correlations, $N_s$ > 4) with an efficiency
similar to that obtained for $FL^{(s)}$ and $FM^{(s)}$ under
a stricter selection, i.e. $N_s$ > 20. Using $f_{{\rm fluc},s}$ we can enclose
a greater number of WVSC1 stars with fewer contaminating
sources selected (see Table 2). $f_{{\rm fluc},s}$ is an empirical relation
and it may be adapted according to its purpose.

6.2. Searching for Variable Stars 

We use the α values (see Sect. 4.2) that select at least 90%
of the WVSC1 stars. Therefore, we adopt α = 0.30 for s = 2
and α = 0.46 for s = 3, which return a combined sample of 1133
variable star candidates that were not included in the WVSC1
catalogue. The periods were computed according to Sect. 5.2 and
we visually inspected each star. According to our analysis these
stars can be divided into five main groups: (a) variable stars measured
in few epochs; (b) variable stars with low signal-to-noise
and low-confidence periods; (c) variable stars with amplitudes
which are near to the noise level; (d) aperiodic variable stars or
variables of such long periods that these data were insufficient
for deriving them; (e) false variables due to instrumental or reduction
problems.

Our procedure has resulted in a catalogue with four new
sources (C1). Fig. 7 shows the C1 stars, which may be included in
groups (a), (b) and (c), with variability index values near to those expected
from noise. Fig. 8 shows some instrumental variations that can
give false positives for variable stars: false variables
due to instrumental saturation (left and middle panels) and due to
data reduction problems (right panel). The separation between
stars of groups (a)-(c) and groups (d)-(e) is not possible using only variability
indices. Discriminating these sources using statistical analyses
will be discussed in a forthcoming paper. Table 4 lists coordinates,
periods, mean magnitudes, and the number of epochs in
each filter for this sample. WVSC-336 has a period of 192 days,
but its period may be longer since we do not observe a complete
variability cycle.

6.3. Two-dimensional View of Correlated Data 

The $f_{\rm fluc}$ defined for variability indices, using the expression
in Eq. 16, presents the best efficiency for selecting WVSC1
stars (see Tables 2 and 3). However, it returns a lot of false positives
when we have few correlations. The combination of indices
based on correlation signals ($K^{(s)}_{fi}$) with those based on correlation
values ($FL^{(s)}$ and $FM^{(s)}$) may provide a two-dimensional
view of correlated data.

Fig. 9 shows $FL^{(s)}$, $FM^{(s)}$, and the logarithm of $FL^{(s)} \times N_s$
as a function of $K^{(s)}_{fi}$ (named KFLs diagrams) for s = 2 (left
panels) and s = 3 (right panels). The KFLs diagrams allow us to
discriminate two main groups: (G1) faint stars, of which about 90%
have $K^{(s)}_{fi}$ = 1 due to a small number of correlations; (G2)
stars with K < 16.5, which include 91% of WVSC1
stars. These groups display a clear separation if $FL^{(s)}$ is multiplied
by the number of correlations ($N_s$): G1 has $\log(FL^{(s)} \times N_s) < 1.5$
and G2 is delimited by $\log(FL^{(s)} \times N_s) > 1.5$. However,
the last diagram is biased by $N_s$, and so comparisons between
different sources are difficult to make.
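
The G1/G2 separation described above amounts to a single threshold, which can be sketched as follows. The function name is hypothetical, the logarithm is assumed to be base 10, and the treatment of sources lying exactly on the boundary or with non-positive $FL^{(s)}$ is an assumption, since the text does not specify these cases.

```python
import math


def kfls_group(fl, n_s, boundary=1.5):
    """Assign a source to G1 or G2 from log10(FL * N_s).

    Returns None when the logarithm is undefined (FL <= 0 or N_s <= 0);
    a source exactly on the boundary is assigned to G1 (an assumption).
    """
    if fl <= 0 or n_s <= 0:
        return None
    return "G2" if math.log10(fl * n_s) > boundary else "G1"


# Made-up values: a faint source with few correlations and a brighter one
# with many correlations.
faint_group = kfls_group(fl=0.5, n_s=10)     # log10(5)   ~ 0.70
bright_group = kfls_group(fl=2.0, n_s=100)   # log10(200) ~ 2.30
```

Because the product scales with $N_s$, two sources with the same $FL^{(s)}$ can fall on opposite sides of the boundary purely because of their number of correlations, which is the bias noted above.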

Stars included in G2 with low values of are mainly 
bright, saturated stars showing a low level of variability and false 
positive variations due to instrumental bias. The boundary be¬ 
tween WVSC1 and other stars is not well defined in this diagram. 
Nevertheless, the KFLs diagram helps us enclose about 90% of
WVSC1 stars with $E_{\rm tot}$ ~ 2 in G2.
Fig. 9. $FL^{(s)}$, $FM^{(s)}$, and the logarithm of $FL^{(s)} \times N_s$ versus the $K^{(s)}_{fi}$ variability indices for stars selected by the $f_{\rm fluc}$ expression using α = 0.30 for s = 2
and α = 0.46 for s = 3 (see Sect. 6.2), for s = 2 (left panels) and s = 3 (right panels). The WVSC1 stars are marked by open black circles and the
colours are set by the K magnitude. Objects marked by crosses do not have a K-band magnitude, being either too faint or saturated; they have
measurements in other bands.



Table 4. Periodic objects in the WFCAM Variable Star Catalog (C1).

ID [WSA]      ID [WVSC]  RA [deg.]     DEC [deg.]   P [d]     <Z>     <Y>     <J>     <H>     <K>     N_Z  N_Y  N_J  N_H  N_K
858994169008  WVSC-335   +277.0394050  +1.7390500   0.40396   15.903  15.573  15.076  14.593  14.368   80   83   97   94   98
858994205031  WVSC-336   +277.4906120  +1.2348900   192       17.228  15.128  12.294  -9.999  -9.999   68   11    8    0    0
858994439420  WVSC-337   +104.8895590  -4.9367370   0.111042  12.409  12.250  12.017  11.822  11.725   25   21   23   21   32
858994483642  WVSC-338   +129.0526690  -10.2269770  0.178714  12.531  12.490  12.275  12.039  11.984   11   10   11   11   11


7. Conclusions 

From our results we can conclude that the analysis of databases
with fewer than 4 correlated measurements is not possible using
$K^{(s)}_{fi}$ and related indices (using the factor $F^{(s)}$) when we consider
the case of one wrong value. In these cases we may use the $L^{(s)}_{pfc}$
or $M^{(s)}_{pfc}$ indices to discriminate correlated and uncorrelated data.
On the other hand, when we have enough correlation measurements,
the $K^{(s)}_{fi}$ variability indices provide unique features for
time-domain analysis that allow us to define a general approach that
can be applied to any survey with correlated epochs: they have a
low sensitivity to outliers, do not undergo strong variations with
magnitude, have a clear interpretation and a theoretical definition
of the value expected for noise, have a well-defined range of
values from 0 to 1, and do not depend on error bars. Therefore
they may be used as a universal method to select correlated
variations. Moreover, KFLs diagrams display two unique variability
features related to the intensity and number of positive correlated
measurements, which allow us to improve $E_{\rm tot}$ by at least
40% (see Sect. 6.3).

The $FL^{(s)}$ and $FM^{(s)}$ values in the KFLs diagrams (see
Fig. 9) may vary for different surveys. However, $K^{(s)}_{fi}$ is not
strongly dependent on instrumental features and its cutoff values
can be adopted as universal values, as can $f_{{\rm fluc},s}$. Its values are related
to the discrimination of correlated and uncorrelated data,
and its response is unbiased with respect to magnitude or observed
wavelength. After selecting the variable star candidates
using $f_{\rm fluc}$, we may use the KFLs diagrams to improve the selection.
Next we can remove G1 stars and use the significance levels
of some periodicity methods to discriminate which of these display
periodic variations.

This work is the first in a series that makes a detailed analysis
of all the steps in processing variable photometric data. In this first paper
we have investigated which indices give the most efficient selection
when we have correlated observations. In the second paper
of this series we will consider uncorrelated observations and determine
which are the best indices for selecting variable stars. In
the coming years we will apply these methods to very large surveys
of the Milky Way, e.g. VVV, PanSTARRS, Gaia, and in the
longer term to LSST, to provide fast and reliable classifications of
variable stars within the Milky Way, which will improve our understanding
of the evolution of different stellar populations and
thus the formation of structures within our Galaxy.

8. Acknowledgements 

C. E. F. L. acknowledges a post-doctoral fellowship from the 
CNPq. N. J. G. C. acknowledges support from the UK Science 
and Technology Facilities Council. 


References

Alard, C. & Lupton, R. H. 1998, ApJ, 503, 325
Baglin, A., Auvergne, M., Barge, P., et al. 2007, in American Institute of Physics
  Conference Series, Vol. 895, Fifty Years of Romanian Astrophysics, ed.
  C. Dumitrache, N. A. Popescu, M. D. Suran, & V. Mioc, 201-209
Bailer-Jones, C. A. L., Andrae, R., Arcay, B., et al. 2013, A&A, 559, A74
Baluev, R. V. 2009, MNRAS, 395, 1541
Baluev, R. V. 2013, MNRAS, 436, 807
Becker, A. C., Wittman, D. M., Boeshaar, P. C., et al. 2004, ApJ, 611, 418
Bhatti, W. A., Richmond, M. W., Ford, H. C., & Petro, L. D. 2010, ApJS, 186, 233
Blomme, J., Debosscher, J., De Ridder, J., et al. 2010, ApJ, 713, L204
Borucki, W. J., Koch, D., Basri, G., et al. 2010, Science, 327, 977
Christiansen, J. L., Derekas, A., Kiss, L. L., et al. 2008, MNRAS, 385, 1749
Cincotta, P. M., Mendez, M., & Nunez, J. A. 1995, ApJ, 449, 231
Clarke, D. 2002, A&A, 386, 763
Corwin, T. M., Sumerel, A. N., Pritzl, B. J., et al. 2006, AJ, 132, 1014
Cross, N. J. G., Collins, R. S., Hambly, N. C., et al. 2009, MNRAS, 399, 1730
Cross, N. J. G., Collins, R. S., Mann, R. G., et al. 2012, A&A, 548, A119
Damerdji, Y., Klotz, A., & Boer, M. 2007, AJ, 133, 1470
De Medeiros, J. R., Ferreira Lopes, C. E., Leao, I. C., et al. 2013, A&A, 555, A63
Debosscher, J., Sarro, L. M., Aerts, C., et al. 2007, A&A, 475, 1159
Deeming, T. J. 1975, Ap&SS, 36, 137
Dubath, P., Rimoldini, L., Suveges, M., et al. 2012, VizieR Online Data Catalog,
  741, 42602
Dupuy, D. L. & Hoffman, G. A. 1985, International Amateur-Professional
  Photoelectric Photometry Communications, 20, 1
Dworetsky, M. M. 1983, MNRAS, 203, 917
Eyer, L. 2006, in Astronomical Society of the Pacific Conference Series, Vol.
  349, Astrophysics of Variable Stars, ed. C. Aerts & C. Sterken, 15
Ferreira Lopes, C. E., Dekany, I., Catelan, C., et al. 2015, A&A, 573, A100
Gossl, C. A. & Riffeser, A. 2002, A&A, 381, 1095
Hambly, N. C., Collins, R. S., Cross, N. J. G., et al. 2008, MNRAS, 384, 637
Hodgkin, S. T., Irwin, M. J., Hewett, P. C., & Warren, S. J. 2009, MNRAS, 394, 675
Hoffman, D. I., Harrison, T. E., & McNamara, B. J. 2009, AJ, 138, 466
Huber, D., Ireland, M. J., Bedding, T. R., et al. 2012, ApJ, 760, 32
Irwin, M. J., Lewis, J., Hodgkin, S., et al. 2004, in Society of Photo-Optical
  Instrumentation Engineers (SPIE) Conference Series, Vol. 5493, Optimizing
  Scientific Return for Astronomy through Information Technologies, ed. P. J.
  Quinn & A. Bridger, 411-422
Kaiser, N., Aussel, H., Burke, B. E., et al. 2002, in Society of Photo-Optical
  Instrumentation Engineers (SPIE) Conference Series, Vol. 4836, Survey and
  Other Telescope Technologies and Discoveries, ed. J. A. Tyson & S. Wolff,
  154-164
Lafler, J. & Kinman, T. D. 1965, ApJS, 11, 216
Lawrence, A., Warren, S. J., Almaini, O., et al. 2007, MNRAS, 379, 1599
Lomb, N. R. 1976, Ap&SS, 39, 447
McCommas, L. P., Yoachim, P., Williams, B. F., et al. 2009, AJ, 137, 4707
Minniti, D., Lucas, P. W., Emerson, J. P., et al. 2010, New A, 15, 433
Morales-Calderon, M., Stauffer, J. R., Rebull, L., et al. 2009, ApJ, 702, 1507
Pasternacki, T., Csizmadia, S., Cabrera, J., et al. 2011, AJ, 142, 114
Paz-Chinchon, F., Leao, I. C., Bravo, J. P., et al. 2015, ArXiv e-prints
Pojmanski, G. 2002, Acta Astron., 52, 397
Pollacco, D. L., Skillen, I., Collier Cameron, A., et al. 2006, PASP, 118, 1407
Renner, S., Rauer, H., Erikson, A., et al. 2008, A&A, 492, 617
Richards, J. W., Starr, D. L., Butler, N. R., et al. 2011, ApJ, 733, 10
Sarro, L. M., Debosscher, J., Lopez, M., & Aerts, C. 2009, A&A, 494, 739
Scargle, J. D. 1982, ApJ, 263, 835
Schwarzenberg-Czerny, A. 1996, ApJ, 460, L107
Shappee, B. J. & Stanek, K. Z. 2011, ApJ, 733, 124
Shin, M.-S., Sekora, M., & Byun, Y.-I. 2009, MNRAS, 400, 1897
Stellingwerf, R. F. 1978, ApJ, 224, 953
Stetson, P. B. 1996, PASP, 108, 851
Suveges, M. 2014, MNRAS, 440, 2099
Templeton, M. 2004, Journal of the American Association of Variable Star
  Observers (JAAVSO), 32, 41
Udalski, A. 2003, Acta Astron., 53, 291
Walkowicz, L. M. & Basri, G. S. 2013, MNRAS, 436, 1883
Welch, D. L. & Stetson, P. B. 1993, AJ, 105, 1813
Wozniak, P. R. 2000, Acta Astron., 50, 421
Wozniak, P. R., Williams, S. J., Vestrand, W. T., & Gupta, V. 2004, AJ, 128, 2965
Yuan, F. & Akerlof, C. W. 2008, ApJ, 677, 808
Zechmeister, M. & Kurster, M. 2009, A&A, 496, 577




